Text Extraction From Documents


Text extraction from documents is the process of recovering machine-readable text from scanned documents or images.

ThaiOCRBench: A Task-Diverse Benchmark for Vision-Language Understanding in Thai

Nov 06, 2025

LINK-KG: LLM-Driven Coreference-Resolved Knowledge Graphs for Human Smuggling Networks

Oct 30, 2025

BiSparse-AAS: Bilinear Sparse Attention and Adaptive Spans Framework for Scalable and Efficient Text Summarization

Oct 31, 2025

Towards Global Retrieval Augmented Generation: A Benchmark for Corpus-Level Reasoning

Oct 30, 2025

Inside CORE-KG: Evaluating Structured Prompting and Coreference Resolution for Knowledge Graphs

Oct 30, 2025

Multi-Stage Field Extraction of Financial Documents with OCR and Compact Vision-Language Models

Oct 27, 2025

ATOM: AdapTive and OptiMized dynamic temporal knowledge graph construction using LLMs

Oct 26, 2025

olmOCR 2: Unit Test Rewards for Document OCR

Oct 22, 2025

Metadata Extraction Leveraging Large Language Models

Oct 22, 2025

C2T-ID: Converting Semantic Codebooks to Textual Document Identifiers for Generative Search

Oct 22, 2025